A Near-Optimal Poly-Time Algorithm for Learning in a Class of Stochastic Games
Abstract
We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.
Similar resources
A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
We present a new algorithm for polynomial time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In ...
Near-Minimum-Time Motion Planning of Manipulators along Specified Path
The large amount of computation necessary for obtaining a time-optimal solution for moving a manipulator along a specified path has made it impossible to introduce an online time-optimal control algorithm. Most of this computational burden is due to the calculation of switching points. In this paper a learning algorithm is proposed for finding the switching points. The method, which can be used for both ...
A Near-Optimal Polynomial Time Algorithm for Learning in Stochastic Games
We present a new algorithm for polynomial time learning of optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [5] in reinforcement learning and of Monderer and Tennenholtz [7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this a...
R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible...
Solving a Stochastic Cellular Manufacturing Model by Using Genetic Algorithms
This paper presents a mathematical model for designing cellular manufacturing systems (CMSs) solved by genetic algorithms. This model assumes dynamic production, stochastic demand, routing flexibility, and machine flexibility. CMS is an application of group technology (GT) for clustering parts and machines by means of their operational and/or apparent form similarity in different aspects ...